Data Structures & Algorithms Essentials

Internalize how data organization + algorithmic paradigms shape performance. Focus on mental models, complexity trade‑offs, and implementation patterns rather than rote memorization.

🧠 Big-O
🧵 Recursion
🗂️ Selection
🔍 Searching
🪜 Traversal
⚡ Optimization

1. Core Complexity Concepts

Asymptotic reasoning is a comparative tool, not a stopwatch.

Asymptotic Notation

  • O(f(n)): upper bound (growth at most f)
  • Ω(f(n)): lower bound (growth at least f)
  • Θ(f(n)): tight bound (both O and Ω)
  • Little-o: strictly smaller growth (f = o(g))
  • Amortized: average cost per operation over a sequence (see the sketch after this list)
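
A concrete instance of amortized cost, as a minimal sketch (class and method names are illustrative): a dynamic array that doubles its capacity pays an occasional O(n) copy, but n pushes cost O(n) total, so each push is O(1) amortized.

class DynArray {
  private buf: number[] = new Array(1);
  private len = 0;

  push(x: number): void {
    if (this.len === this.buf.length) {
      // Rare O(n) copy; doubling means it happens only after len cheap pushes.
      const bigger: number[] = new Array(this.buf.length * 2);
      for (let i = 0; i < this.len; i++) bigger[i] = this.buf[i];
      this.buf = bigger;
    }
    this.buf[this.len++] = x; // the common O(1) case
  }
}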

Growth Ladder (Typical)

  • O(1) → O(log n) → O(n) → O(n log n) → O(n²) → O(2^n) → O(n!)
  • Logarithms often appear with divide & conquer or tree height
  • n log n arises from log n levels of balanced splitting, each doing O(n) combine work

Space vs Time Trade-off

  • Caching / memoization: trade memory for fewer recomputations
  • Precomputation (prefix sums, sparse tables) speeds queries (see the sketch after this list)
  • Compression / streaming reduces memory at CPU cost
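
A minimal sketch of the precomputation idea (function names are illustrative): prefix sums spend O(n) once so every later range-sum query answers in O(1).

// p[i] holds the sum of a[0..i-1]; built once in O(n).
function buildPrefix(a: number[]): number[] {
  const p = new Array(a.length + 1).fill(0);
  for (let i = 0; i < a.length; i++) p[i + 1] = p[i] + a[i];
  return p;
}

// Sum of a[l..r] inclusive, answered in O(1) per query.
function rangeSum(p: number[], l: number, r: number): number {
  return p[r + 1] - p[l];
}

For example, buildPrefix([2, 4, 1]) yields [0, 2, 6, 7], and rangeSum(p, 1, 2) = 7 - 2 = 5.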

Data Structure Selection Heuristics

Need | Best Candidates | Avoid When | Notes
Fast membership test | Hash Set / Bloom Filter | Adversarial keys (hash attacks) | Bloom = probabilistic (false positives)
Sorted iteration | Balanced BST / Heap (partial) | Random-only access | Heaps only guarantee the top element
Min/Max streaming | Heap / Monotonic Queue | Need arbitrary removal | Monotonic queue for sliding windows
FIFO multi-producer | Queue / Ring Buffer | Random deletions required | Ring buffer for fixed capacity
LRU cache | Hash Map + Doubly Linked List | Massive key space w/ low reuse | O(1) get + put with eviction (sketch below)
Optimization mindset: measure → hypothesize → change one variable → re‑measure. Choose structure by workload, not preference.
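
A minimal LRU sketch (names illustrative) leaning on one assumption worth flagging: a JavaScript/TypeScript Map iterates keys in insertion order, so re-inserting a key moves it to the "most recent" end; this stands in for the classic hash map + doubly linked list pairing from the table.

class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  put(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size === this.capacity) {
      // Evict the least recently used entry: the first key in iteration order.
      this.map.delete(this.map.keys().next().value!);
    }
    this.map.set(key, value);
  }
}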

2. Core Data Structures

Understand shape, operations, and cost model.

Array

Contiguous memory, O(1) index, expensive middle inserts (O(n)). Great for iteration & cache locality.

O(1) access · resize cost

Linked List

Node chain w/ pointers. O(1) head insert; O(n) index; pointer overhead, poor cache locality.

O(1) head · fragmented

Stack

LIFO: push/pop O(1). Used for backtracking, parsing, and emulating DFS recursion. Backed by an array or linked list.

call frames

Queue

FIFO: enqueue/dequeue O(1). Scheduling, BFS. Use a circular buffer to avoid O(n) shifts.

breadth

Hash Table

Average O(1) put/get; degrades to O(n) worst case if collisions go unmitigated. Good hashing & resizing are crucial.

amortized

Binary Search Tree

Ordered structure. Average O(log n) ops; unbalanced degenerates to O(n). Self-balancing variants (AVL, Red-Black) maintain logarithmic height.

ordered

Heap (Binary)

Partial order: root holds the min/max. Insert & extract O(log n). Useful for priority scheduling & streaming top-k queries (see the sketch below).

priority
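
A minimal array-backed binary min-heap sketch (illustrative, not production-grade): the parent of index i sits at (i - 1) >> 1, and both operations walk a single root-to-leaf path, hence O(log n).

class MinHeap {
  private a: number[] = [];

  insert(x: number): void {
    this.a.push(x);
    let i = this.a.length - 1;
    while (i > 0) { // sift up until the parent is no larger
      const p = (i - 1) >> 1;
      if (this.a[p] <= this.a[i]) break;
      [this.a[p], this.a[i]] = [this.a[i], this.a[p]];
      i = p;
    }
  }

  extractMin(): number | undefined {
    if (this.a.length === 0) return undefined;
    const min = this.a[0];
    const last = this.a.pop()!;
    if (this.a.length > 0) {
      this.a[0] = last;
      let i = 0;
      for (;;) { // sift down toward the smaller child
        const l = 2 * i + 1, r = 2 * i + 2;
        let s = i;
        if (l < this.a.length && this.a[l] < this.a[s]) s = l;
        if (r < this.a.length && this.a[r] < this.a[s]) s = r;
        if (s === i) break;
        [this.a[s], this.a[i]] = [this.a[i], this.a[s]];
        i = s;
      }
    }
    return min;
  }
}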

Graph

Entities (nodes) + relations (edges). Choose adjacency list vs matrix based on density.

networks

Trie (Prefix Tree)

String-keyed; O(L) lookup, where L is the key length. Memory heavy; compress with a radix tree / DAWG. (See the sketch below.)

strings
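
A minimal trie sketch (names illustrative): both operations touch one node per character, so cost is O(L) in the key length regardless of how many keys are stored.

class TrieNode {
  children = new Map<string, TrieNode>();
  isWord = false;
}

class Trie {
  private root = new TrieNode();

  insert(word: string): void {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch)!;
    }
    node.isWord = true; // mark the end of a complete key
  }

  contains(word: string): boolean {
    let node = this.root;
    for (const ch of word) {
      const next = node.children.get(ch);
      if (!next) return false;
      node = next;
    }
    return node.isWord;
  }
}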

3. Algorithmic Paradigms

Patterns > isolated tricks. Generalize transformations.

Sorting Spectrum

  • Comparison Lower Bound: Ω(n log n)
  • Bubble / Selection / Insertion: O(n²); fine for tiny or nearly-sorted arrays
  • Merge: stable, O(n log n), O(n) extra space (see the sketch after this list)
  • Quick: average O(n log n), worst O(n²) with poor pivots
  • Heap: O(n log n), in-place, not stable
  • Counting/Radix: linear time when digits/keys come from a bounded domain
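
A merge sort sketch (function name illustrative): two half-size recursions plus a linear merge give T(n) = 2T(n/2) + O(n) = O(n log n); taking from the left half on ties keeps it stable.

function mergeSort(a: number[]): number[] {
  if (a.length <= 1) return a;
  const mid = a.length >> 1;
  const left = mergeSort(a.slice(0, mid));   // solve left half
  const right = mergeSort(a.slice(mid));     // solve right half
  const out: number[] = [];
  let i = 0, j = 0;
  while (i < left.length && j < right.length) {
    out.push(left[i] <= right[j] ? left[i++] : right[j++]); // <= preserves stability
  }
  while (i < left.length) out.push(left[i++]);
  while (j < right.length) out.push(right[j++]);
  return out;
}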

Searching Patterns

  • Linear scan: unsorted data, O(n)
  • Binary search: sorted data, O(log n)
  • Two pointers (sorted arrays or partition problems)
  • Sliding window for subarray/substring constraints
  • Binary search on answer (monotone feasibility function; see the sketch after this list)
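
A sketch of binary search on the answer (feasible is a hypothetical placeholder the caller supplies): when feasible(x) is monotone (all false, then all true), the boundary is found in O(log range) probes.

// Returns the smallest x in [lo, hi] with feasible(x) === true,
// assuming feasible is monotone and at least one feasible x exists.
function minFeasible(lo: number, hi: number, feasible: (x: number) => boolean): number {
  while (lo < hi) {
    const mid = lo + ((hi - lo) >> 1);
    if (feasible(mid)) hi = mid;  // mid works: the answer is mid or smaller
    else lo = mid + 1;            // mid fails: the answer is strictly larger
  }
  return lo;
}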

Recursion & Divide

  • Split → Solve → Combine (merge sort)
  • Tail recursion: the final call can be rewritten as a loop (not every language optimizes this)
  • Backtracking: explore + undo (N-Queens, subsets; see the sketch after this list)
  • Pruning reduces exponential blow-up
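
A minimal backtracking sketch enumerating subsets (names illustrative): each level chooses an element, recurses, then undoes the choice so sibling branches start from a clean state.

function subsets(nums: number[]): number[][] {
  const result: number[][] = [];
  const current: number[] = [];
  function explore(start: number): void {
    result.push([...current]);    // snapshot the choices made so far
    for (let i = start; i < nums.length; i++) {
      current.push(nums[i]);      // choose
      explore(i + 1);             // explore deeper
      current.pop();              // undo (backtrack)
    }
  }
  explore(0);
  return result;
}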

Dynamic Programming

  • Optimal substructure + overlapping subproblems
  • Top-down memo vs bottom-up table
  • State definition clarity > code golf
  • Space optimization (rolling arrays; see the sketch after this list)
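
A sketch contrasting the two DP styles on a toy recurrence (Fibonacci; names illustrative): top-down memoization caches recursive calls, while the bottom-up version rolls the table down to two variables.

// Top-down: cache each subproblem so it is solved only once.
function fibMemo(n: number, memo = new Map<number, number>()): number {
  if (n <= 1) return n;
  if (memo.has(n)) return memo.get(n)!;
  const v = fibMemo(n - 1, memo) + fibMemo(n - 2, memo);
  memo.set(n, v);
  return v;
}

// Bottom-up with a rolling "table": only the last two states are kept.
function fibRolled(n: number): number {
  if (n <= 1) return n;
  let prev = 0, curr = 1;
  for (let i = 2; i <= n; i++) [prev, curr] = [curr, prev + curr];
  return curr;
}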

Greedy Choice

  • A local optimum composes into the global optimum (prove it via exchange argument)
  • Search for a counterexample before committing
  • Activity selection, interval scheduling (see the sketch after this list)
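
A sketch of greedy activity selection (names illustrative): sort by finish time and take every interval compatible with the last choice; the exchange argument shows no schedule can do better.

interface Interval { start: number; end: number; }

function maxActivities(intervals: Interval[]): Interval[] {
  const sorted = [...intervals].sort((a, b) => a.end - b.end); // earliest finish first
  const chosen: Interval[] = [];
  let lastEnd = -Infinity;
  for (const iv of sorted) {
    if (iv.start >= lastEnd) { // greedy choice: compatible with all picks so far
      chosen.push(iv);
      lastEnd = iv.end;
    }
  }
  return chosen;
}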

Graph Traversal

  • BFS: shortest path in unweighted graphs (see the sketch after this list)
  • DFS: cycle detection, topological order (post-order)
  • Dijkstra: shortest paths with non-negative edge weights
  • Union-Find: connectivity queries
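
A minimal BFS sketch on an adjacency list (names illustrative): processing vertices in FIFO order visits the graph level by level, so the recorded distances are shortest path lengths in edges. An index pointer stands in for a real queue to avoid O(n) shifts.

function bfsDistances(adj: Map<number, number[]>, source: number): Map<number, number> {
  const dist = new Map<number, number>([[source, 0]]);
  const queue: number[] = [source];
  let head = 0;                       // O(1) dequeue without shifting
  while (head < queue.length) {
    const u = queue[head++];
    for (const v of adj.get(u) ?? []) {
      if (!dist.has(v)) {             // first visit = shortest distance
        dist.set(v, dist.get(u)! + 1);
        queue.push(v);
      }
    }
  }
  return dist;
}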

Classic Recurrence Examples

// Merge sort: T(n) = 2T(n/2) + O(n) => O(n log n)
// Binary search: T(n) = T(n/2) + O(1) => O(log n)
// Tower of Hanoi: T(n) = 2T(n-1) + 1 => O(2^n)
// Master Theorem forms aid direct classification
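// Master Theorem shape: T(n) = a·T(n/b) + O(n^d)
//   d > log_b(a) → O(n^d);  d = log_b(a) → O(n^d · log n);  d < log_b(a) → O(n^(log_b(a)))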

4. Interactive Practice Lab

Experiment with parameters to build intuition.

  • Complexity Growth Estimator: enter n to compare growth rates
  • Sorting Step Visualizer: step through sorting operations
  • BFS / DFS Runner: inspect the traversal visit order

5. Complexity Cheat Sheet

Big-O (average unless noted).

Structure | Access | Search | Insert | Delete
Array | O(1) | O(n) | O(n) | O(n)
Dynamic Array (amort.) | O(1) | O(n) | O(1) | O(n)
Singly Linked List | O(n) | O(n) | O(1)* | O(1)*
Hash Table | — | O(1) | O(1) | O(1)
BST (balanced) | O(log n) | O(log n) | O(log n) | O(log n)
Heap | O(n) | O(n) | O(log n) | O(log n)
Trie | O(L) | O(L) | O(L) | O(L)
*Assumes a head reference (insert/delete at the head). Hash table worst case degrades to O(n) with a poor key distribution.

6. Review & Mastery Prep

Track your progress deliberately.

Concept Q&A

Why is quicksort often faster than merge sort despite same O(n log n)?

In-place (better cache locality), lower constant factors, no extra O(n) memory. Pivot quality influences performance.

Difference between recursion and backtracking?

Backtracking is structured recursion where you explore a decision space, undo (backtrack) partial choices to search alternatives systematically.

When to use a heap over sorting entire data?

When you only need the top k elements or a running min/max over a stream. Maintaining a size-k heap costs O(n log k) versus O(n log n) to sort everything.

Dynamic programming vs memoization?

DP usually refers to a bottom-up table filled in a fixed order; memoization is top-down caching. Both exploit overlapping subproblems; the choice depends on the recurrence's natural direction and whether the full table is needed.